
7 research outputs found

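Hizketa-ezagutzan oinarritutako estrategiak, euskarazko online OBHI (Ordenagailu Bidezko Hizkuntza Ikaskuntza) sistemetarako [Speech-recognition-based strategies for online Basque computer-assisted language learning (CALL) systems]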

Author: Odriozola Sustaeta Igor
Publication date: 03/05/2019

211 pp. (English), 217 pp. (Basque). This thesis studies two implementations of automatic speech recognition for Basque, intended for computer-assisted language learning systems (OBHI, Ordenagailu Bidezko Hizkuntza Ikaskuntza): computer-assisted pronunciation training (OBEL, Ordenagailu Bidezko Ebakera Lanketa) and spoken grammar practice (AGP, Ahozko Gramatika Praktika). In the classic OBEL system, the user is asked to read a sentence aloud and receives a score for each phoneme in return. For AGP, we propose the word-by-word sentence verification technique (HHEE, Hitzez Hitzeko Esaldi Egiaztapena), a system that verifies exercises as they are being solved. Both systems rest on utterance verification techniques such as the Goodness of Pronunciation (GOP) score. Implementing these systems requires training acoustic models, and for that we used the Basque Speecon-like database, the only publicly available database for Basque. To obtain good acoustic models, the database had to be adapted by creating a dictionary with alternative entries, and phased training was also tried. A phone error rate (PER) of 12.21% was obtained this way. The first system was tested under laboratory conditions, with competitive results. However, the goal for the OBEL and AGP systems of this thesis is deployment in a client/server architecture, so that learners can access them from anywhere. To make this possible, HTML5 specifications were used to send audio to the server while it is being recorded, and a new on-line cepstral mean and variance normalisation (CMVN) technique was proposed to reduce the effect of channel differences in the audio signals recorded by users. That technique is based on a method presented in this thesis, Multi Normalization Scoring (MNS), and a new on-line voice activity detector (VAD) based on the same method is also proposed. Finally, different parameters were evaluated using neural networks, and the GOP score was found to be the most effective.
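The abstract names the GOP score without defining it. As a rough illustration only, the sketch below follows the commonly cited duration-normalised formulation (target-phone log-likelihood minus the best competing phone's log-likelihood, divided by the number of frames). It keeps the sign instead of taking the absolute value, and it is not taken from the thesis. The function name `gop_score` and the input layout are assumptions made for this example.

```python
from typing import Dict, List


def gop_score(frame_loglik: Dict[str, List[float]], target: str) -> float:
    """Duration-normalised GOP for one aligned phone segment.

    frame_loglik maps each candidate phone to its per-frame log-likelihoods
    over the same segment (hypothetical input; a real system would obtain
    these from its acoustic models).
    """
    n_frames = len(frame_loglik[target])
    if n_frames == 0:
        raise ValueError("segment has no frames")
    target_ll = sum(frame_loglik[target])
    # Best hypothesis over all candidate phones, the target included.
    best_ll = max(sum(lls) for lls in frame_loglik.values())
    # Always <= 0; 0 means the target phone is itself the best match.
    return (target_ll - best_ll) / n_frames


# Toy segment of three frames with made-up log-likelihoods.
segment = {
    "a": [-2.0, -2.5, -1.5],
    "e": [-3.0, -3.5, -2.5],
    "o": [-4.0, -4.0, -4.0],
}
print(gop_score(segment, "a"))  # target is the best-scoring phone
print(gop_score(segment, "e"))  # target scores worse than "a"
```

A score near zero suggests the expected phone was the best acoustic match; more negative values suggest a likely mispronunciation, with the rejection threshold left to the system designer.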

Archivo Digital para la Docencia y la Investigación

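Hizketa-ezagutzan oinarritutako estrategiak, euskarazko online OBHI (Ordenagailu Bidezko Hizkuntza Ikaskuntza) sistemetarako [Speech-recognition-based strategies for online Basque computer-assisted language learning (CALL) systems]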

Author: Odriozola Sustaeta Igor
Publication date: 01/01/2019

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital para la Docencia y la Investigación

An on-line VAD based on Multi-Normalisation Scoring (MNS) of observation likelihoods

Author: Hernaez Rioja Inmaculada Concepción
Navas Cordón Eva
Odriozola Sustaeta Igor
Publication venue: Elsevier BV
Publication date: 31/05/2018

Preprint of the article published online on 31 May 2018. Voice activity detection (VAD) is an essential task in expert systems that rely on oral interfaces. The VAD module detects the presence of human speech and separates speech segments from silences and non-speech noises. The most popular current on-line VAD systems are based on adaptive parameters which seek to cope with varying channel and noise conditions. The main disadvantages of this approach are the need for some initialisation time to properly adjust the parameters to the incoming signal and uncertain performance when the initial parameters are poorly estimated. In this paper we propose a novel on-line VAD based only on previous training which does not introduce any delay. The technique is based on a strategy that we have called Multi-Normalisation Scoring (MNS). It consists of obtaining a vector of multiple observation likelihood scores from normalised mel-cepstral coefficients previously computed from different databases. A classifier is then used to label the incoming observation likelihood vector. Encouraging results have been obtained with a Multi-Layer Perceptron (MLP). This technique can generalise to unseen noise levels and types. A validation experiment against two current standard ITU-T VAD algorithms demonstrates the good performance of the method. Indeed, lower classification error rates are obtained for non-speech frames, while results for speech frames are similar. This work was partially supported by the EU (ERDF) under grant TEC2015-67163-C2-1-R (RESTORE) (MINECO/ERDF, EU) and by the Basque Government under grant KK-2017/00043 (BerbaOla).

Archivo Digital para la Docencia y la Investigación

The observation likelihood of silence: analysis and prospects for VAD applications

Author: Hernaez Rioja Inmaculada Concepción
Navas Cordón Eva
Odriozola Sustaeta Igor
Serrano García Luis
Sánchez de la Fuente Jon
Publication venue: International Speech Communication Association
Publication date: 23/11/2018

This paper presents a study of the behaviour of the observation likelihoods generated by the central state of a silence HMM (Hidden Markov Model) trained for Automatic Speech Recognition (ASR) using cepstral mean and variance normalization (CMVN). We have seen that the observation likelihood shows a stable behaviour under different recording conditions, and this characteristic can be used to discriminate between speech and silence frames. We present several experiments which prove that the mere use of a decision threshold produces robust results for very different recording channels and noise conditions. The results have also been compared with those obtained by two standard VAD systems, showing promising prospects. All in all, observation likelihood scores could be useful as the basis for the development of future VAD systems, with further research and analysis to refine the results. This work has been partially supported by the EU (FEDER) under grant TEC2015-67163-C2-1-R (RESTORE) (MINECO/FEDER, EU) and by the Basque Government under grant KK-2017/00043 (BerbaOla).
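The thresholding idea in this abstract can be sketched in a few lines. The example below is a simplified illustration, not the authors' system: it applies per-utterance CMVN, scores each frame with a single diagonal-covariance Gaussian standing in for the central state of the silence HMM, and compares the log-likelihood with a fixed threshold. The model parameters, the threshold value and all function names are assumptions.

```python
import math
from typing import List, Sequence


def cmvn(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Per-utterance cepstral mean and variance normalisation."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant dimensions
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]


def diag_gauss_loglik(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log-likelihood of x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def label_frames(frames: Sequence[Sequence[float]], sil_mean: Sequence[float],
                 sil_var: Sequence[float], threshold: float) -> List[str]:
    """Label a frame 'silence' when its silence log-likelihood reaches the threshold."""
    return [
        "silence" if diag_gauss_loglik(f, sil_mean, sil_var) >= threshold else "speech"
        for f in cmvn(frames)
    ]


# Toy one-dimensional "cepstra": three quiet frames and one loud frame.
# The silence model (mean -0.6, variance 0.25) and threshold are made up.
print(label_frames([[0.0], [0.0], [0.0], [10.0]], [-0.6], [0.25], -2.0))
```

Because the features are normalised before scoring, the same threshold can in principle be reused across channels, which is the property the abstract reports; a real system would use the trained HMM state's mixture model and may normalise on-line rather than per utterance.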

Archivo Digital para la Docencia y la Investigación

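Hizketa-ezagutzan oinarritutako estrategiak, euskarazko online OBHI (Ordenagailu Bidezko Hizkuntza Ikaskuntza) sistemetarako [Speech-recognition-based strategies for online Basque computer-assisted language learning (CALL) systems]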

Author: Odriozola Sustaeta Igor
Publication date: 03/05/2019

Design and development of an automatic pronunciation evaluation system for Basque

Author: Hernáez Rioja Inmaculada
Hoffmann Rüdiger
Jokisch Oliver
Odriozola Sustaeta Igor
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2012

This paper introduces the first steps in the development of a computer-assisted pronunciation teaching (CAPT) system for Basque. The baseline is a standard automatic speech recognition (ASR) system based on hidden Markov models (HMMs) that uses GOP (Goodness of Pronunciation) confidence scores for phoneme verification. This ASR system will be integrated into AzAR, the pronunciation training software developed for German and several Slavonic languages. The paper presents the initial steps in the design of the curriculum for Basque, some verification problems caused by the use of HMMs built from an ASR database, and some preliminary results.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Stem and ending division based treatment of Western Basque lexicon for dialectal speech recognition

Author: Hernáez Rioja Inmaculada
Navas Cordón Eva
Odriozola Sustaeta Igor
Sánchez de la Fuente Jon
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2009

This paper introduces a first approach to Basque dialectal speech recognition, based on dividing dictionary entries into stems and endings. This achieves two objectives: on the one hand, it greatly reduces the size of the dictionary by treating the agglutinative grammatical cases of Basque as a finite set of endings; on the other hand, it handles the phonetic and phonological variants that these grammatical cases show in the different varieties of the western dialect. The paper describes the procedure followed in the experiments and the results obtained. This work was partially funded by the Spanish Ministry of Education and Science (Ministerio de Educación y Ciencia) under the AVIVAVOZ project (TEC2006-13694-C03-02, www.avivavoz.es) and by the Basque Government through its grant to research groups of the Basque university system (IT-444-07).
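The dictionary-size argument in this abstract can be made concrete with a small count. The sketch below is only an illustration under simple assumptions: every stem combines with every ending, the stems and endings are a toy sample, and the "+" marker on endings and the function name `lexicon_sizes` are inventions for this example rather than the paper's notation.

```python
from itertools import product
from typing import Sequence, Tuple


def lexicon_sizes(stems: Sequence[str], endings: Sequence[str]) -> Tuple[int, int]:
    """Return (full-form entries, stem-plus-ending entries) for a toy lexicon."""
    # Full-form lexicon: one entry per inflected word.
    full_forms = {stem + ending for stem, ending in product(stems, endings)}
    # Split lexicon: stems and endings are stored once each;
    # "+" is a hypothetical marker for ending units.
    split_units = set(stems) | {"+" + ending for ending in endings}
    return len(full_forms), len(split_units)


# Two stems ("etxe", "mendi") and four case endings: 8 full forms vs 6 units.
print(lexicon_sizes(["etxe", "mendi"], ["a", "ak", "an", "ra"]))
```

The full-form count grows with the product of stems and endings, while the split count grows with their sum, so the saving widens quickly as the vocabulary grows. Dialectal variants of an ending would add a few ending units rather than a new full form for every stem.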

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas